Automatic Determination of Clustering Centers for Clustering by Fast Search and Find of Density Peaks
WANG Wanliang¹, WU Fei¹, LÜ Chuang¹
¹College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023
|
Abstract: The algorithm of clustering by fast search and find of density peaks cannot select cluster centers automatically. To solve this problem, a method for automatically determining the cluster centers of clustering by fast search and find of density peaks is proposed. First, the density and the distance are normalized to handle the uneven distributions of these two variables. Then, the upper limit of the normalized density threshold is determined by Chebyshev's inequality, and the standard deviation is used to determine the upper limit of the normalized distance threshold. Finally, the upper limit of the decision threshold is determined from the decision function. The two criteria are considered jointly to avoid missing any center point, so that the cluster centers are determined automatically. Experiments show that the proposed algorithm selects cluster centers adaptively and effectively, with good robustness and validity.
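The abstract describes a threshold pipeline (normalize, then apply density, distance and decision thresholds), but it gives no formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation. The density ρ (cutoff kernel), the distance δ and the decision value γ = ρ·δ follow the standard definitions from Rodriguez and Laio's density peaks clustering. Everything after that is hypothetical: min-max normalization, the Chebyshev tail fraction `p_tail`, the distance multiplier `m`, the "mean plus a multiple of the standard deviation" form of each threshold, and the rule that combines the two criteria as a union. The helper names `rho_delta`, `min_max`, `select_centers` and `plus` are illustrative only.

```python
import math
from statistics import mean, pstdev


def rho_delta(points, dc):
    """Cutoff-kernel density rho and distance delta, as in standard density peaks clustering."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cutoff distance dc
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    # Visit points in decreasing density; the stable sort breaks ties by index
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point takes its largest distance to any point
            delta[i] = max(d[i])
        else:
            # Otherwise: distance to the nearest point that precedes it in density order
            delta[i] = min(d[i][j] for j in order[:rank])
    return rho, delta


def min_max(values):
    """Assumed normalization: rescale values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_centers(points, dc, p_tail=0.5, m=1.0):
    """HYPOTHETICAL threshold rules; p_tail and m are illustrative parameters."""
    rho, delta = rho_delta(points, dc)
    r, s = min_max(rho), min_max(delta)
    g = [a * b for a, b in zip(r, s)]  # decision value gamma = rho * delta
    # Chebyshev: P(|X - mu| >= k * sigma) <= 1 / k**2. With k = 1 / sqrt(p_tail),
    # at most a fraction p_tail of the points can lie at or above mu + k * sigma.
    k = 1.0 / math.sqrt(p_tail)
    t_rho = mean(r) + k * pstdev(r)
    t_delta = mean(s) + m * pstdev(s)  # assumed: mean plus m standard deviations
    t_gamma = mean(g) + k * pstdev(g)  # assumed: Chebyshev-style bound on gamma
    # Assumed combination: a point is a center if it passes the density and
    # distance thresholds together, OR the decision threshold, so a center
    # missed by one criterion can still be caught by the other.
    return [i for i in range(len(points))
            if (r[i] > t_rho and s[i] > t_delta) or g[i] > t_gamma]


def plus(cx, cy):
    """Five points in a plus shape around (cx, cy)."""
    return [(cx, cy), (cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]


# Three well-separated plus-shaped clusters; indices 0, 5 and 10 are their middles
pts = plus(0, 0) + plus(20, 0) + plus(0, 20)
print(select_centers(pts, dc=1.5))  # [0, 5, 10]
```

On this toy data each cluster middle has ρ = 4 while every arm has ρ = 3 and δ = 1, so both hypothetical criteria isolate the same three points. Real data, where the two criteria can disagree, is where the union rule would matter.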
Received: 28 March 2019
|
|
Fund: Supported by the National Natural Science Foundation of China (No. 61873240)
Corresponding Author:
WANG Wanliang, Ph.D., professor. His research interests include deep learning, artificial intelligence and big data.

About Authors: WU Fei, master student. Her research interests include big data and data mining. LÜ Chuang, master student. His research interests include big data and data mining.
|
|
|
|
|